Dynamic Patterns of Public Transport Usage

Author

Wan Kee

Published

November 25, 2023

Dynamic Patterns of Public Transport Usage: A Geospatial Analysis of Bus Stop Passenger Volume in Urban Environments

1.1 Overview

In the intricate mosaic of urban transportation, bus stops serve as pivotal nodes that capture the pulse of city life through the ebb and flow of passenger trips. The study of passenger trip generation, particularly during peak hours, becomes essential for enhancing service efficiency, planning urban infrastructure, and improving the overall commuter experience. This geospatial analysis is anchored in the bustling landscape of a metropolitan area, where data on bus stops, resident population distribution, urban development plans (master plan sub-zones), and passenger volume intertwine to paint a comprehensive picture of transit dynamics.

This analysis aims to dissect the rhythms of urban mobility with the following objectives:

  1. Geovisualization and Analysis

    • Compute the passenger trips generated by origin at the hexagon level,

    • Display the geographical distribution of the passenger trips by using appropriate geovisualisation methods,

    • Describe the spatial patterns revealed by the geovisualisation (not more than 200 words per visual).

  2. Local Indicators of Spatial Association (LISA) Analysis

    • Compute LISA of the passengers trips generate by origin at hexagon level. Display the LISA maps of the passengers trips generate by origin at hexagon level. The maps should only display the significant (i.e. p-value < 0.05) With reference to the analysis results, draw statistical conclusions (not more than 200 words per visual).
  3. Emerging Hot Spot Analysis(EHSA) With reference to the passenger trips by origin at the hexagon level for the four time intervals given above:

    • Perform Mann-Kendall Test by using the spatio-temporal local Gi* values,

    • Prepared EHSA maps of the Gi* values of the passenger trips by origin at the hexagon level. The maps should only display the significant (i.e. p-value < 0.05). With reference to the EHSA maps and data visualisation prepared, describe the spatial patterns reveled. (not more than 250 words per cluster).

1.2 Load packages

sf dplyr mapview tmap

pacman::p_load(knitr, mapview, spdep, tmap, tidyverse, sf)

1.3 Import data

busstops contains the detailed information for all bus stops currently being serviced by buses, including bus stop code, road name, description, location coordinates. Source: LTA DataMall (Postman URL)

busstops <- st_read(dsn = "data/BusStopLocation_Jul2023", layer = "BusStop") %>%
  st_transform(crs = 3414)
Reading layer `BusStop' from data source 
  `/Users/chockwankee/Documents/chockwk/ISSS624_Geospatial_Analytics/Take_home_Ex/Take_home_Ex01/data/BusStopLocation_Jul2023' 
  using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21

The output indicates that the geospatial objects are point features. There are 5161 features and 3 fields. It is in SVY21 projected coordinates system with XY dimension.

Geospatial statistics are published by the Singapore Department of Statistics. They are available from the Population Trends, Census of Population and General Household Survey reports. Singapore Residents by Planning Area/Subzone, Age Group, Sex and Floor Area of Residence, June 2011 onwards

PA - Planning Area SZ - Subzone AG - Age Group Sex - Sex FA - Floor Area of Residence Pop - Resident Count Time - Time / Period

  1. For June 2011 to 2019, Planning Areas refer to areas demarcated in the Urban Redevelopment Authority’s Master Plan 2014.
  2. For June 2020, Planning Areas refer to areas demarcated in the Urban Redevelopment Authority’s Master Plan 2019.
  3. Data from 2003 onwards exclude residents who have been away from Singapore for a continuous period of 12 months or longer as at the reference period.
  4. The figures have been rounded to the nearest 10.
  5. The data may not add up due to rounding.

Source: Department of Statistics (Link)

popdata <- read_csv("data/respopagesexfa2011to2020/respopagesexfa2011to2020.csv")

odbus contains the number of trips by weekdays and weekends from origin to destination bus stops. It indicates passenger volume for October 2023. Source: LTA DataMall (Postman URL)

odbus = read_csv("data/origin_destination_bus_202310.csv")

The output does not indicate any geospatial objects. There are 5,694,297 records and 7 fields.

Master Plan 2019 Subzone Boundary

The Master Plan is a forward looking guiding plan for Singapore’s development in the medium term over the next 10 to 15 years published in 2019. Note this mpsz differs from that in previous chapter, Data Wrangling.

hexagon is a hexagon layer of 250m (this distance is the perpendicular distance between the centre of the hexagon and its edges.) should be used to replace the relative coarse and irregular Master Plan 2019 Planning Sub-zone GIS data set of URA.

Source: URA (Download here)

mpsz = st_read(dsn="data/MPSZ-2019", layer="MPSZ-2019") %>% 
  st_transform(crs=3414)
Reading layer `MPSZ-2019' from data source 
  `/Users/chockwankee/Documents/chockwk/ISSS624_Geospatial_Analytics/Take_home_Ex/Take_home_Ex01/data/MPSZ-2019' 
  using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS:  WGS 84

The output indicates that the geospatial objects are multipolygon features. There are 332 features and 6 fields. It is in WGS84 projected coordinates system with XY dimension.

# Assuming mpsz is your sf object
#mpsz <- st_read(dsn="data/MP19_SUBZONE", layer="MP19_SUBZONE_NO_SEA")

# Ensure that 'geometry' is the active geometry column
#mpsz <- st_set_geometry(mpsz, "geometry")

# Check for invalid geometries and apply st_make_valid if needed
#mpsz$geometry_is_valid <- st_is_valid(mpsz$geometry)
#mpsz <- mpsz %>%
#  mutate(geometry = ifelse(!geometry_is_valid, st_make_valid(geometry), geometry))

# Now, remove the Z dimension explicitly if it exists
#mpsz <- st_zm(mpsz, drop = TRUE, what = "Z")

# View the data structure to confirm changes
#print(mpsz1)
Warning

Warning: GDAL Message 1: Sub-geometry 0 has coordinate dimension 2, but container has 3

The warning message indicate that a discrepancy in the dimensionality of the geometries in mpsz Shapefile. Some of the sub-geometries have 2D coordinates (X and Y), while the overall container expects 3D coordinates (X, Y, and Z). The st_zm(what = "Z") function drops the Z dimension from the geometries, which eliminate the warnings about coordinate dimensions. We are not performing 3D analysis, this approach should work fine for most applications.

1.4 Create Spatial Grids

Spatial grids are commonly used in spatial analysis to divide the study area into equal size, regular polygons that tessellate the area of interest. On the selection of grid, regular geographic units, such as square grid or fishnets, rarely reflect real world situations. Hexagons are compact shape and can overcome oddly-shaped geographical units.

Step 1: Create a hexgonal grid

# Create hexagonal grid (250m from center to edges)
area_hexagon_grid = st_make_grid(busstops, cellsize = 500, what = "polygons", square = FALSE)
area_hexagon_grid
Geometry set for 5580 features 
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 3470.122 ymin: 26193.43 xmax: 48720.12 ymax: 53184.55
Projected CRS: SVY21 / Singapore TM
First 5 geometries:

Step 2: Add Grid ID

hexagon_grid_sf = st_sf(area_hexagon_grid) %>% 
  mutate(grid_id = 1:length(lengths(area_hexagon_grid)))
hexagon_grid_sf
Simple feature collection with 5580 features and 1 field
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 3470.122 ymin: 26193.43 xmax: 48720.12 ymax: 53184.55
Projected CRS: SVY21 / Singapore TM
First 10 features:
                area_hexagon_grid grid_id
1  POLYGON ((3720.122 26626.44...       1
2  POLYGON ((3720.122 27492.46...       2
3  POLYGON ((3720.122 28358.49...       3
4  POLYGON ((3720.122 29224.51...       4
5  POLYGON ((3720.122 30090.54...       5
6  POLYGON ((3720.122 30956.57...       6
7  POLYGON ((3720.122 31822.59...       7
8  POLYGON ((3720.122 32688.62...       8
9  POLYGON ((3720.122 33554.64...       9
10 POLYGON ((3720.122 34420.67...      10

Step 3: Count the number of bus stops in each grid and remove grids without any value.

hexagon_grid_sf$grid_id = lengths(st_intersects(hexagon_grid_sf, busstops))
hexagon_count = filter(hexagon_grid_sf, grid_id>0)

Step 4: Set tmap to view mode for interactive plotting.

tmap_mode("view")
hexagon = tm_shape(hexagon_count)+
  tm_fill(
    col = "grid_id",
    palette = "Reds",
    style = "cont",
    title = "Number of Bus Stops",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.6,
    popup.format = list(
      grid_id = list(format = "f", digits = 0)
    )
  )+
  tm_borders(col = "grey40", lwd = 0.7)
hexagon

1.5 Explore data

glimpse(busstops)
Rows: 5,161
Columns: 4
$ BUS_STOP_N <chr> "22069", "32071", "44331", "96081", "11561", "66191", "2338…
$ BUS_ROOF_N <chr> "B06", "B23", "B01", "B05", "B05", "B03", "B02A", "B02", "B…
$ LOC_DESC   <chr> "OPP CEVA LOGISTICS", "AFT TRACK 13", "BLK 239", "GRACE IND…
$ geometry   <POINT [m]> POINT (13576.31 32883.65), POINT (13228.59 44206.38),…
glimpse(popdata)
Rows: 738,492
Columns: 7
$ PA   <chr> "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio", "Ang Mo K…
$ SZ   <chr> "Ang Mo Kio Town Centre", "Ang Mo Kio Town Centre", "Ang Mo Kio T…
$ AG   <chr> "0_to_4", "0_to_4", "0_to_4", "0_to_4", "0_to_4", "0_to_4", "0_to…
$ Sex  <chr> "Males", "Males", "Males", "Males", "Males", "Males", "Females", …
$ FA   <chr> "<= 60", ">60 to 80", ">80 to 100", ">100 to 120", ">120", "Not A…
$ Pop  <dbl> 0, 10, 30, 80, 20, 0, 0, 10, 40, 90, 10, 0, 0, 10, 30, 110, 30, 0…
$ Time <dbl> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011,…
glimpse(odbus)
Rows: 5,694,297
Columns: 7
$ YEAR_MONTH          <chr> "2023-10", "2023-10", "2023-10", "2023-10", "2023-…
$ DAY_TYPE            <chr> "WEEKENDS/HOLIDAY", "WEEKDAY", "WEEKENDS/HOLIDAY",…
$ TIME_PER_HOUR       <dbl> 16, 16, 14, 14, 17, 17, 17, 7, 14, 14, 10, 20, 20,…
$ PT_TYPE             <chr> "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "…
$ ORIGIN_PT_CODE      <chr> "04168", "04168", "80119", "80119", "44069", "2028…
$ DESTINATION_PT_CODE <chr> "10051", "10051", "90079", "90079", "17229", "2014…
$ TOTAL_TRIPS         <dbl> 3, 5, 3, 5, 4, 1, 24, 2, 1, 7, 3, 2, 5, 1, 1, 1, 1…
glimpse(mpsz)
Rows: 332
Columns: 7
$ SUBZONE_N  <chr> "MARINA EAST", "INSTITUTION HILL", "ROBERTSON QUAY", "JURON…
$ SUBZONE_C  <chr> "MESZ01", "RVSZ05", "SRSZ01", "WISZ01", "MUSZ02", "MPSZ05",…
$ PLN_AREA_N <chr> "MARINA EAST", "RIVER VALLEY", "SINGAPORE RIVER", "WESTERN …
$ PLN_AREA_C <chr> "ME", "RV", "SR", "WI", "MU", "MP", "WI", "WI", "SI", "SI",…
$ REGION_N   <chr> "CENTRAL REGION", "CENTRAL REGION", "CENTRAL REGION", "WEST…
$ REGION_C   <chr> "CR", "CR", "CR", "WR", "CR", "CR", "WR", "WR", "CR", "CR",…
$ geometry   <MULTIPOLYGON [m]> MULTIPOLYGON (((33222.98 29..., MULTIPOLYGON (…

1.5 Plot data

location = mapview(busstops, cex = 3, alpha = 0.5, popup = NULL)
location
popdata2020 <- popdata %>%
  filter(Time == 2020) %>%
  group_by(PA, SZ, AG) %>%
  summarise(`POP` = sum(`Pop`)) %>%
  ungroup() %>%
  pivot_wider(names_from=AG, values_from=POP) %>%
  mutate(YOUNG = rowSums(.[3:6]) + rowSums(.[12])) %>%
  mutate(`ECONOMY ACTIVE` = rowSums(.[7:11]) + rowSums(.[13:15]))  %>%
  mutate(`AGED`=rowSums(.[16:21])) %>%
  mutate(`TOTAL`=rowSums(.[3:21])) %>%
  mutate(`DEPENDENCY` = (`YOUNG` + `AGED`)/`ECONOMY ACTIVE`) %>%
  select(`PA`, `SZ`, `YOUNG`, `ECONOMY ACTIVE`, `AGED`, `TOTAL`, `DEPENDENCY`)

1.6 Geovisualisation and Analysis

1.6.1 Compute the bus stop density

# Filter hexagons that contain more than 8 bus stops
hexagon_red = filter(hexagon_grid_sf, grid_id>8)

tmap_mode("view")
redhex = tm_shape(hexagon_red)+
  tm_fill(
    col = "grid_id",
    palette = "Reds",
    style = "cont",
    title = "Number of Bus Stops",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.6,
    popup.format = list(
      grid_id = list(format = "f", digits = 0)
    )
  )+
  tm_borders(col = "grey40", lwd = 0.7)
redhex

Sembawang MRT (9 bus stops)

Pasir Ris (11 bus stops)

1.6.2 Compute the passenger trips generated by origin

Step 1:

Peak hour period Bus tap on time Weekday morning peak 6am to 9am Weekday afternoon peak 5pm to 8pm Weekend/holiday morning peak 11am to 2pm Weekend/holiday evening peak 4pm to 7pm

odbus$ORIGIN_PT_CODE <- as.factor(odbus$ORIGIN_PT_CODE)
odbus$DESTINATION_PT_CODE <- as.factor(odbus$DESTINATION_PT_CODE)

Step 1:

passengertrips <- left_join(busstops, odbus, 
                            by = c("BUS_STOP_N" = "ORIGIN_PT_CODE"))

Step 2:

passengertrips_grouped <- passengertrips %>%
  group_by(BUS_STOP_N, BUS_ROOF_N, LOC_DESC, YEAR_MONTH, geometry) %>%
  summarise(
    WEEKDAY_TRIPS = sum(TOTAL_TRIPS[DAY_TYPE == "WEEKDAY"], na.rm = TRUE),
    WEEKENDS_HOLIDAYS_TRIPS = sum(TOTAL_TRIPS[DAY_TYPE != "WEEKDAY"], na.rm = TRUE),
    .groups = "drop"
  )

passengertrips_grouped <- passengertrips_grouped %>% 
  filter(!(WEEKDAY_TRIPS == 0 & WEEKENDS_HOLIDAYS_TRIPS == 0))

Step 3: Ensure both datasets are in the same coordinate reference system (CRS).

st_crs(passengertrips_grouped)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(hexagon_grid_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

Step 4: Perform a spatial join to match trips to hexagons

passengergrid <- st_join(hexagon_grid_sf, passengertrips, join = st_intersects)
# Remove rows with no trips (NA values)
passengergrid_clean <- passengergrid %>% 
  filter(!is.na(TOTAL_TRIPS))

Step 5: Split passengers trip into weekday and weekend

# Subset for Weekday
weekday_trips <- passengergrid_clean %>%
  filter(DAY_TYPE == "WEEKDAY") %>% 
  group_by(BUS_STOP_N) %>%
  summarise(
    Total_Trips = sum(TOTAL_TRIPS, na.rm = TRUE),
  )

# Subset for Weekend
weekend_trips <- passengergrid_clean %>%
  filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>% 
  group_by(BUS_STOP_N) %>%
  summarise(
    Total_Trips = sum(TOTAL_TRIPS, na.rm = TRUE),
  )
# Convert your data to an sf object if it's not one already
weekday_sf <- st_as_sf(weekday_trips, coords = c("longitude", "latitude"), crs = 4326) 

# Calculate breaks at intervals of 25,000
max_trips <- max(weekday_sf$Total_Trips, na.rm = TRUE)
breaks <- c(1, 2500, 5000, 7500, 10000, 25000, 50000, Inf)

# Now, to plot the data with specified breaks:
tm_shape(weekday_sf) +
  tm_polygons("Total_Trips", 
              palette = "Blues", 
              border.col = "grey40",
              breaks = breaks,
              title = "Weekday Trips") +
  tm_layout(main.title = "Weekday Trips by Bus Stop", 
            main.title.position = "center", 
            frame = FALSE)
# Print the map
tmap_mode("view")
tm_shape(weekday_sf) +
  tm_polygons("Total_Trips", 
              palette = "Blues", 
              border.col = "grey40",
              breaks = breaks,
              title = "Total Trips") +
  tm_scale_bar()+
  tm_compass(type = "8star", size = 2) +
  tm_layout(main.title = "Total Trips by Bus Stop", 
            main.title.position = "center", 
            frame = FALSE)
# Convert your data to an sf object if it's not one already
weekend_sf <- st_as_sf(weekend_trips, coords = c("longitude", "latitude"), crs = 4326) 

# Calculate breaks at intervals of 25,000
max_trips <- max(weekend_sf$Total_Trips, na.rm = TRUE)
breaks <- c(1, 2500, 5000, 7500, 10000, 25000, 50000, Inf)

# Now, to plot the data with specified breaks:
tm_shape(weekend_sf) +
  tm_polygons("Total_Trips", 
              palette = "Purples", 
              border.col = "grey40",
              breaks = breaks,
              title = "Weekday Trips") +
  tm_layout(main.title = "Weekday Trips by Bus Stop", 
            main.title.position = "center", 
            frame = FALSE)
# Print the map
tmap_mode("view")
tm_shape(weekend_sf) +
  tm_polygons("Total_Trips", 
              palette = "Purples", 
              border.col = "grey40",
              breaks = breaks,
              title = "Total Trips") +
  tm_scale_bar()+
  tm_compass(type = "8star", size = 2) +
  tm_layout(main.title = "Total Trips by Bus Stop", 
            main.title.position = "center", 
            frame = FALSE)

1.6 Local Indicators of Spatial Association (LISA) Analysis

1.7 Emerging Hot Spot Analysis (EHSA)

weekday evening peak passenger trips by origin; hexagons dont not have shared boundary; isolated with no neighbours but a hotspot.